samremes commented Oct 23, 2025

Motivation

Llama4 Maverick uses a custom routing function that selects the top-k router logits and applies a sigmoid to them instead of a softmax: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama4.py#L62-L71
Implemented in plain torch, this custom routing function becomes a significant overhead, especially during inference.
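To make the routing semantics concrete, here is a framework-free Python sketch of a "top-k, then sigmoid" router over one token's logits. It is an illustration only, not the vLLM or aiter code: the helper names `topk_then_sigmoid` and `sigmoid_then_topk` are hypothetical. Because the sigmoid is monotonically increasing, selecting the top-k logits and then squashing them picks the same experts as squashing first (barring ties or saturation in floating point). This sketch does not renormalize the selected weights.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def topk_then_sigmoid(logits: List[float], k: int) -> Tuple[List[float], List[int]]:
    # Hypothetical reference: pick the k largest logits (ties -> lower index),
    # then apply the sigmoid only to the selected values.
    idx = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    return [sigmoid(logits[i]) for i in idx], idx


def sigmoid_then_topk(logits: List[float], k: int) -> Tuple[List[float], List[int]]:
    # Same selection done in the opposite order; sigmoid is monotonic,
    # so the chosen experts match (up to floating-point ties).
    probs = [sigmoid(x) for x in logits]
    idx = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    return [probs[i] for i in idx], idx


if __name__ == "__main__":
    logits = [0.5, -1.0, 2.0, 0.1]  # router logits for 4 experts, one token
    weights, experts = topk_then_sigmoid(logits, k=2)
    print(experts, weights)  # weights are not normalized to sum to 1
```

Unlike a softmax router, each weight depends only on its own logit, so no reduction over all experts is needed after the top-k selection.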

Technical Details

Depends on a composable_kernel PR: ROCm/composable_kernel#3062
3rdparty/composable_kernel needs to be bumped once that CK PR is merged.

Test Plan

Added a simple test, op_tests/test_moe_topk_sigmoid.py, that covers both the fp16 and bf16 cases.
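The test log reports `checkAllclose atol=0.01 rtol=0.01` when comparing the fused kernel against the torch baseline. As a minimal sketch of what such a tolerance check means, assuming it uses the same element-wise criterion as `torch.allclose` (|a - e| <= atol + rtol * |e|); aiter's `checkAllclose` may differ in details such as how it reports partial mismatches. The `allclose` helper below is hypothetical and works on plain Python lists.

```python
from typing import Sequence


def allclose(actual: Sequence[float], expected: Sequence[float],
             atol: float = 0.01, rtol: float = 0.01) -> bool:
    # Hypothetical helper mirroring the torch.allclose criterion:
    # every element must satisfy |a - e| <= atol + rtol * |e|.
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    print(allclose([1.005, 2.0], [1.0, 2.0]))  # within tolerance
    print(allclose([1.05], [1.0]))             # outside tolerance
```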

Test Result

python3 op_tests/test_moe_topk_sigmoid.py 
[aiter] import [module_aiter_enum] under /workspaces/dev/aiter/aiter/jit/module_aiter_enum.so
[W1023 15:37:48.300574905 collection.cpp:1114] Warning: ROCTracer produced duplicate flow start: 1 (function operator())
[aiter] import [module_moe_asm] under /workspaces/dev/aiter/aiter/jit/module_moe_asm.so
[aiter] [checkAllclose atol=0.01 rtol=0.01 passed~]
[aiter] [checkAllclose atol=0.01 rtol=0.01 passed~]
Runtime (torch baseline):     29.784888888888865
Runtime (fused topk sigmoid): 4.163444444444443
Uplift:                       7.15x
[aiter] [checkAllclose atol=0.01 rtol=0.01 passed~]
[aiter] [checkAllclose atol=0.01 rtol=0.01 passed~]
Runtime (torch baseline):     31.291888888888884
Runtime (fused topk sigmoid): 4.296666666666662
Uplift:                       7.28x
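The reported uplift values are consistent with the ratio of baseline runtime to fused runtime. The log does not state the time unit, but both runtimes share it, so the ratio is unitless. A quick check that reproduces both values from the logged numbers (the `uplift` helper is hypothetical):

```python
def uplift(baseline: float, fused: float) -> str:
    # Speedup = baseline runtime / fused runtime, formatted like the test log.
    return f"{baseline / fused:.2f}x"


runs = [
    (29.784888888888865, 4.163444444444443),  # first result group in the log
    (31.291888888888884, 4.296666666666662),  # second result group in the log
]

if __name__ == "__main__":
    for baseline, fused in runs:
        print(uplift(baseline, fused))
```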

Submission Checklist
